A scalable architecture for multilingual speech recognition on embedded devices
Abstract
In-car infotainment and navigation devices are typical examples of successfully applied speech-based interfaces. While classical applications such as voice commands or destination input are monolingual, the trend is towards multilingual applications, for example music player control or multilingual destination input. As more languages are considered, the training and decoding complexity of the speech recognizer increases. For large multilingual systems, some kind of parameter tying is needed to keep decoding feasible on embedded systems with limited resources. A traditional technique for this is to use a semi-continuous Hidden Markov Model (HMM) as the acoustic model. However, the monolingual codebook on which such a system relies is not appropriate for multilingual recognition. We introduce Multilingual Weighted Codebooks, which give good results with low decoding complexity. Because these codebooks depend on the actual language combination, they increase the training complexity, so an algorithm is needed that reduces it. Our first proposal is a set of mathematically motivated projections between Hidden Markov Models defined in Gaussian spaces. Although theoretically optimal, these projections were difficult to employ directly in speech decoders. We found approximated projections to be the most effective in practice: they give good performance without requiring major modifications to the common speech recognizer architecture. By combining Multilingual Weighted Codebooks with Gaussian Mixture Model projections, we create an efficient and scalable architecture for non-native speech recognition. The new architecture offers a solution to the combinatorial problems of training and decoding for multiple languages. It builds new multilingual systems in only 0.002% of the time of a traditional HMM training and achieves comparable performance on foreign languages.
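The parameter tying mentioned in the abstract can be illustrated with a minimal sketch of a semi-continuous HMM emission computation. This is the generic textbook formulation, not the authors' implementation: all states share one codebook of Gaussians, and each state keeps only its own mixture weights. The function names, the toy codebook, and the weights below are hypothetical, and diagonal covariances are assumed for simplicity.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def codebook_densities(x, codebook):
    """Evaluate every shared codebook Gaussian once for the frame x."""
    return [math.exp(log_gaussian_diag(x, mean, var)) for mean, var in codebook]

def state_likelihood(densities, weights):
    """Emission likelihood of one state: weighted sum over the shared codebook."""
    return sum(w * d for w, d in zip(weights, densities))

# Hypothetical toy codebook: two 2-dimensional Gaussians as (mean, variance) pairs.
codebook = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
# Each state stores only its mixture weights over the shared codebook.
state_weights = {"s1": [0.9, 0.1], "s2": [0.2, 0.8]}

frame = [0.0, 0.0]
densities = codebook_densities(frame, codebook)  # computed once per frame
likelihoods = {s: state_likelihood(densities, w) for s, w in state_weights.items()}
```

Because the Gaussians are evaluated once per frame and reused by every state, the per-state cost reduces to a weighted sum, which is what makes this kind of tying attractive on devices with limited resources.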
Similar resources
Online generation of acoustic models for multilingual speech recognition
Our goal is to provide a multilingual speech-based Human Machine Interface for in-car infotainment and navigation systems. Multilinguality is needed, for example, for music player control via speech, as artist and song names in the globalized music market come from many languages. Another frequent use case is the input of foreign navigation destinations via speech. In this paper we propose app...
Real world approaches for multilingual and non-native speech recognition
This thesis proposes a scalable architecture for multilingual speech recognition on embedded devices. In theory, multiple languages can be recognized just as one language. However, current state-of-the-art speech recognition systems are based on statistical models with many parameters. Extending such models to multiple languages requires more resources. Therefore a lot of research in the area of...
Speaker- and language-independent speech recognition in mobile communication systems
In this paper, we investigate the technical challenges that are faced when making a transition from the speaker-dependent to speaker-independent speech recognition technology in mobile communication devices. Due to globalization as well as the international nature of the markets and the future applications, speaker independence implies the development and use of language-independent ASR to avoid ...
Lucent automatic speech recognition: a speech recognition engine for internet and telephony service applications
Based on Bell Labs speech recognition and understanding technology, we developed LASR3 (Lucent Automatic Speech Recognition, Version 3), a speaker independent, software-based continuous speech recognition engine. It is compatible with Microsoft Speech Application Programming Interface (MS SAPI)[1]. LASR3 provides support for desktop, telephony, and internet applications requiring speech recogni...
Mobile Speech Recognition
This chapter gives an overview of the main architectures for enabling speech recognition on embedded devices. Starting with a short overview of speech recognition, an overview of the main challenges for the use on embedded devices is given. Each of the architectures has its own characteristic problems and features. This chapter gives a solid basis for the selection of an architecture that is mo...
Journal:
- Speech Communication
Volume 53
Publication date: 2011